Advances in Word based Dialect/
Abstract
In an earlier study, we proposed an effective dialect/accent classification algorithm called Word-based Dialect Classification (WDC). WDC works well on large corpora and significantly outperforms traditional Large Vocabulary Continuous Speech Recognition (LVCSR) based systems, which have been claimed to be the best-performing systems for language identification. With a small training corpus, however, it is difficult to obtain a robust statistical model for every word in every dialect. We therefore formulate a Context Adapted Training (CAT) algorithm, which adapts universal phoneme GMMs to dialect-dependent word HMMs via linear regression. On IViE, an eight-dialect British English corpus, the WDC system trained with CAT achieves a 35.5% relative reduction in classification error over the baseline LVCSR system and a 20.2% relative reduction over the basic WDC system.
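The abstract does not give the CAT update equations. As a rough illustration of what adapting Gaussian models "via linear regression" can look like, the sketch below estimates an MLLR-style affine transform shared by all Gaussian means. It is a simplification under stated assumptions: one-dimensional features, a single variance shared by every Gaussian (so the maximum-likelihood estimate reduces to weighted least squares), and precomputed frame posteriors. It does not build the dialect-dependent word HMMs themselves. The names `estimate_mean_transform` and `adapt_means` and the toy data are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def estimate_mean_transform(
    means: Sequence[float],
    frames: Sequence[float],
    posteriors: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Estimate a shared affine transform (a, b) with adapted mean = a * mu + b.

    Illustrative simplification (assumption): 1-D features and one variance
    shared by all Gaussians, so maximum likelihood reduces to weighted least
    squares. posteriors[t][m] is the occupancy of Gaussian m at frame t.
    """
    # Accumulate the weighted sufficient statistics of the 2x2 normal equations.
    s_mm = s_m = s_g = s_mo = s_o = 0.0
    for o_t, gammas in zip(frames, posteriors):
        for mu, g in zip(means, gammas):
            s_mm += g * mu * mu
            s_m += g * mu
            s_g += g
            s_mo += g * mu * o_t
            s_o += g * o_t
    # Solve [[s_mm, s_m], [s_m, s_g]] @ [a, b] = [s_mo, s_o] by Cramer's rule.
    det = s_mm * s_g - s_m * s_m
    if abs(det) < 1e-12:
        raise ValueError("adaptation data too sparse to estimate a and b")
    a = (s_mo * s_g - s_m * s_o) / det
    b = (s_mm * s_o - s_m * s_mo) / det
    return a, b


def adapt_means(means: Sequence[float], a: float, b: float) -> List[float]:
    """Apply the shared transform to every Gaussian mean."""
    return [a * mu + b for mu in means]


if __name__ == "__main__":
    # Hypothetical "universal" means and adaptation frames generated with a=1.5, b=0.5.
    universal = [0.0, 2.0]
    frames = [0.5, 3.5, 0.5, 3.5]
    posteriors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    a, b = estimate_mean_transform(universal, frames, posteriors)
    print(a, b)  # 1.5 0.5
    print(adapt_means(universal, a, b))  # [0.5, 3.5]
```

Because a single transform is shared by every Gaussian, only two parameters are estimated here regardless of how many means are moved. This is the general reason regression-based adaptation is attractive when adaptation data are scarce, which is the small-corpus situation the abstract describes.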
Similar resources
The Status of [h] and [ʔ] in the Sistani Dialect of Miyankangi
The purpose of this article is to determine the phonemic status of [h] and [ʔ] in the Sistani dialect of Miyankangi. Auditory tests applied to the relevant data show that [ʔ] occurs mainly in word-initial position, where it stands in free variation with Ø. The only place where [h] is heard is in Arabic and Persian loanwords, and only in the pronunciation of some speakers who are educated and/or...
Assimilation of Final Low Back Vowel in Eghlidian Dialect
In this article, the low back vowel /A/ in word-final position in the Eghlidian dialect, one of the Persian dialects, is studied. This vowel is realized phonetically as [A], [o] and [@] in different phonetic environments. Therefore, many words were collected by interviewing ten native speakers so that these alternant forms could be accounted for appropriately. Since one of the authors of th...
A Study of Inflectional Categories of Noun in Sistani Dialect
The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of the noun in the Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data were collected by recording free speech and interviewing 30 (15 female, 15 male) illiterate Sistani language consultants aged 40–102 year...
The Use of the Almeida-Braun System in the Measurement of Dutch Dialect Distances
Measuring dialect distances can be based on the comparison of words, and the comparison of words should be based on the comparison of sounds. In this research we used an adjusted version of an articulation-based system, developed by Almeida and Braun (1986) for finding sound distances, using the IPA system. For comparison of two pronunciations of a word corresponding with two different varieties, ...
Dialect Pronunciation Comparison and Spoken Word Recognition
Two adaptations of the regular Levenshtein distance algorithm are proposed based on psycholinguistic work on spoken word recognition. The first adaptation is inspired by the Cohort model which assumes that the word-initial part is more important for word recognition than the word-final part. The second adaptation is based on the notion that stressed syllables contain more information and are mo...